1. INTRODUCTION

Good generalization performance in pattern recognition can be attained with any classifier when its
classification capacity is matched to the size of the training set. Classifiers keep a number of
parameters that change according to the situation and are therefore known as adjustable parameters. A
classifier with many adjustable parameters, and hence a large capacity, can learn the training set
quickly and without error but tends to generalize somewhat poorly [1]. On the contrary, a classifier
with too few adjustable parameters cannot learn the task competently at all. Between these extremes lies
the capacity that minimizes the expected generalization error for the given training set. Empirical
evidence and theoretical investigation both link the generalization error of a classifier to its error
on the training set and to the complexity of the learning problem [2], [3]. Risk minimization is the
foundation of a classifier that handles this trade-off with very high performance in pattern
recognition: the support vector machine (SVM).

Today this classifier solves a large number of regression and classification problems with high
accuracy. The aim of an SVM is to find a hyper-plane in n-dimensional space that separates the data into
classes; the data points that determine this hyper-plane are called support vectors [4]. SVM belongs to
the category of supervised learning, in which learning algorithms analyze labeled data to solve
classification and regression problems. An SVM represents the training set as data points in a bounded
region and separates them into two classes with the help of the support vectors [5]. The gap between the
two separable classes should be as wide as possible. The main reason for making this gap wide is
prediction: a new data point is assigned to a class according to the side of the gap on which it falls
[6].

SVM performs two types of classification: linear and non-linear. Linear classification applies when
the labeled classes can be separated by a hyper-plane in the input space. A linear classifier on its own
cannot handle non-linear classification problems. For these, researchers use what is known as the kernel
trick, in which the inputs are implicitly mapped into a high-dimensional feature space [7]. An SVM
constructs a hyper-plane in this high-dimensional space, chosen so that the distance to the nearest
training points of either class (the support vectors) is as large as possible. The gap between the
selected support vectors of the two classes is called the functional margin. In general, the larger the
margin, the lower the generalization error of the classifier, and vice versa [8]. The original problem
may be stated in a finite-dimensional space in which the data sets are not linearly separable; for this
reason, that space is mapped into a higher-dimensional one. To keep the computation reasonable, the
mappings used by SVM are designed so that dot products of pairs of input vectors can be calculated in
terms of the original variables through a kernel function $k(x, y)$. A hyper-plane in the
high-dimensional space is defined as the set of points whose dot product with a fixed vector in that
space is constant. The vectors defining the hyper-planes can be chosen as linear combinations, with
parameters $\alpha_i$, of the images of the feature vectors $x_i$ in the database. A point $x$ in the
feature space that is mapped onto the hyper-plane then satisfies the relation
$\sum_i \alpha_i k(x_i, x) = \text{constant}$. If $k(x, y)$ becomes small as $y$ moves farther away from
$x$, every term of the summation measures the degree of proximity of the test point $x$ to the
corresponding database point $x_i$. The kernel summation can therefore be used to measure the relative
proximity of every test point, and with some modifications it yields the optimal solution for the
problem at hand [9]. The usage of SVM varies with the problem domain and covers a variety of real-world
problems such as text categorization, image classification, recognition of handwritten characters, and
bioinformatics [10].


2. RECENT LITERATURE 

This section reviews the most closely related work of researchers who have used SVM in their
research problems, focusing on the parts of that research that concern the algorithms. The SVM algorithm
classifies the data according to the steps in Figure 1.


Input: I: input data
Output: V: set of support vectors
Begin
Step 1: Divide the given dataset into two sets of data items with different class labels assigned to them
Step 2: Add them to the support vector set V
Step 3: Loop over the divided n data items
Step 4: If a data item is not assigned any of the class labels, then add it to set V
Step 5: Break if insufficient data items are found
Step 6: End loop
Step 7: Train using the derived SVM classifier model and test so as to validate over the unlabeled data items
End


Figure 1. Steps of SVM for sentiment analysis 


The research of Pavlidis et al. [11] focuses on improving classification performance in sentiment
analysis. The proposed methodology combines SVM and Naive Bayes (NB) and reports excellent results on
accuracy metrics. They give two new algorithms for sentiment analysis, one at the word level and one at
the sentence level. These algorithms [12] perform preprocessing steps on the selected datasets and
translate unstructured reviews into a structured form, which is then translated into numerical values
with the help of a lexicon scoring method. The focus of that research is therefore feature selection and
semantic analysis. For classification, SVM is used with its radial basis kernel function. Consider a
dataset $D = \{X_i, y_i\}$, where $X_i$ is the set of records with class labels $y_i$. The separating
hyper-plane is $W \cdot x + b = 0$, where $W = \{w_1, w_2, \ldots, w_n\}$ is a weight vector over $n$
features and $b$ is the bias [13]. The maximum margin hyper-plane is obtained through a Lagrangian
calculation, and the SVM classifier tests a record $X$ with the formula
$f(X) = \sum_{i=1}^{N} \alpha_i k(X, X_i) + b$, where $k(X, X_i)$ is the radial basis kernel function
$\exp\left(-\|X - X_i\|^2 / 2\sigma^2\right)$. Another research article presented text classification
through cosine similarity with latent semantic indexing for the Arabic language; the algorithm proposed
in that article [14] is built from a few useful steps. Table 1 lists the related literature on the use
of SVM, with its properties, functions, datasets, algorithm modifications, rules, and hybridization, to
solve different types of real-world problems [15].


Table 1. Research fields in which SVM has been used with excellent results

Research area            Count frequency (references)
Sentiment analysis       [16]-[20]
Text categorization      [21]-[23]
Image classification     [24]-[26]
Bio-informatics          [27]-[30]
Other fields             [31]-[33]


3. BACKGROUND INFORMATION ABOUT SUPPORT VECTOR MACHINES 

Over the past two decades, machine learning has progressed within the knowledge domain and now acts
as its backbone. It has become a pivot point in our daily life. In today's world, data grows day by day,
and controlling access to the available data has turned into a problem. It is therefore reasonable to
expect that smart data analysis will become inseparable from technical development. Machine learning can
be viewed from two sides, art and science. As an art, it tries to reduce a range of rather disparate
problems to a set of fairly narrow models; as a science, it provides solutions to these problems
together with their generalization performance [34], [35]. Intelligent learning covers processes that
are difficult to describe precisely. By the standard definition, learning means increasing knowledge,
understanding, or expertise through study, training, and practice. Learning applied to machine systems,
mechanisms, infrastructure, approaches, and working styles is therefore collectively known as machine
learning. Normally, a system is modified so that it can carry out activities directly connected to
artificial intelligence. These activities include detection, analysis, arrangement, robot control, and
calculation. The modifications can be enhancements of an existing system or entirely new [36].

The main idea of machine learning can be understood from four broad perspectives. First, its focal
point is prediction, and prediction relies on background knowledge, for example knowledge about what
happened in the past. Second, it searches in practice for the relationship between the original data and
the prediction. Third, it exploits a set of stimuli or inputs for prediction by means of statistical
approaches. Last, it tries to predict the value of a variable Y given an input feature set X [37].
Machine learning is divided into three main categories: supervised learning, semi-supervised learning,
and unsupervised learning. The sub-divisions of these categories are shown in Figure 2. Studying support
vectors is beneficial from two major perspectives. First, the theory is strong enough for satisfactory
analysis: it rests on simple ideas and has a clear structure. Second, it leads to high performance in
practical applications. This intersection of theory and practice makes support vectors useful in
practical life [38]. Some classes of algorithms can be analyzed mathematically within a particular field,
and environmental factors support their structure. The real world, however, often calls for more
multifaceted designs and algorithms, such as neural networks, which are harder than support vectors to
analyze theoretically [39].

The SVM finds the support vectors that separate two classes with a large margin. Support vectors
are easy to treat mathematically because they correspond to a linear method in a high-dimensional
feature space. This feature space is non-linearly related to the input space. Although the method is
linear in the high-dimensional space, in experimental applications no computation is actually carried
out in that space. Instead, kernels are a trick for solving this computational issue [40]: with kernels,
all the necessary computations are performed directly in the input space. Kernels are thus a very neat
solution and a characteristic feature of support vector methods.

Nowadays, researchers focus on non-linear techniques for detection, regression problems, or the
extraction of significant features in useful applications. Support vectors are the root of several
branches of study: the theory, the algorithms for finding the optimal hyper-plane, the computation of
kernel operations, and their functional evaluation [41].

Hyper-planes: in mathematical terms, a hyper-plane is a sub-space whose dimension is one less than
that of its surrounding space. If the space is 3D, its hyper-planes are 2D planes, while if the space is
2D, its hyper-planes are 1D lines. These rules apply in any general space in which the dimension of a
sub-space is defined. Hyper-planes help support vectors in activities such as pattern recognition or
natural language processing. Figure 3 illustrates the categories of abstract spaces in which
hyper-planes are used. Every space has some characteristic properties, e.g., a Banach space has a norm
and completeness [42]. A hyper-plane is the solution set of a single linear equation of the form
$a_1 x_1 + a_2 x_2 + \ldots + a_n x_n + b = 0$. Projective hyper-planes extend the concept of the plane.
A single projective hyper-plane does not divide the space into two parts; two hyper-planes are needed to
separate points and divide the space. A projective sub-space is a set of points with the property that,
for any two points of the set, all the points on the line determined by them also belong to the set; the
space wraps around, so that both sides of a lone hyper-plane are connected [43], [44]. For the structure
of the algorithm, a class of functions is required whose capacity can be computed. Support vector
methods are based on the class of hyper-planes corresponding to the decision function
$(w \cdot x) + b = 0$, where $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$. This simple equation is the
starting point for obtaining the optimal hyper-plane within the space [40].

Generalization of SVM: machine learning is a sub-field of artificial intelligence whose main concern
is the development of approaches and schemes that allow a computer to learn. This development covers
algorithms in which learning is the major task, together with its sub-activities. In many respects,
machine learning shares characteristics with statistics. Over time, various useful approaches and
structural designs have been proposed for machine learning tasks [45].

The SVM algorithm originated with Vapnik in 1963. In 1992, SVM was first introduced as a set of
related supervised learning methods for classification and regression problems, including non-linear
classifiers. For the non-linear classifier, Vapnik used the kernel trick to obtain maximum margin
hyper-planes [46], [47]. SVM is a specific type of classifier whose working layout depends on a
conversion process: the training dataset is transformed into a higher-dimensional space in order to find
decision boundaries that divide the classes. These boundaries are hyper-planes, which are identified by
the support vectors, rather than by the other data points, and by their margins. The margins are
parallel lines defined by the shortest distance between the hyper-plane and its support vectors. SVM is
therefore capable of classifying both linear and non-linear datasets [48], [49]. SVM can be roughly
sketched through the following significant issues [50]:
- Class division: researchers try to obtain the optimal separating hyper-plane between the selected
  classes by maximizing the distance. The distance between the decision boundaries and the support
  vectors is called the margin, as shown in Figure 3, and the optimal hyper-plane lies in the middle of
  this margin.
- Internal representation of classes: data points that lie on the wrong side of the margin are weighted
  down to reduce their influence.
- Non-linear property of classes: if a linear separation cannot be found, the data points are projected
  into a higher-dimensional space in which they become linearly separable; the kernel trick carries out
  this projection efficiently [51].
- Weights of classes: this depends on a particular vector known as the weight component, for example
  when A and B are two different classes of asymmetric size with their own weight vectors.
- Cross-validation for classification: to evaluate on the training data, k-fold cross-validation is
  executed over the possible combinations of the parameters; most of the time, default values are set
  for good accuracy.
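The k-fold cross-validation over parameter combinations described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the folds are contiguous (no shuffling), and `fit_and_score` stands for whatever routine trains an SVM on the training indices and returns its accuracy on the test indices.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(n_samples, k, fit_and_score):
    """Average fit_and_score(train_idx, test_idx) over the k folds."""
    folds = k_fold_indices(n_samples, k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th one is used for training.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / k


def grid_search(n_samples, k, param_grid, make_scorer):
    """Return the parameter setting with the best mean cross-validation score."""
    return max(param_grid, key=lambda p: cross_validate(n_samples, k, make_scorer(p)))
```

Here `make_scorer(p)` is assumed to return a `fit_and_score` function for one parameter setting `p`, for example one value of the capacity parameter or one kernel width.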

Why is the SVM margin equal to $2 / \|w\|$? Geometrically, the margin is the maximum distance
between two parallel hyper-planes that separate the two point sets. Let $x_0$ be a point on the
hyper-plane $w \cdot x - b = -1$, so that $w \cdot x_0 - b = -1$ [52], [53]. To measure the distance
between the hyper-planes $w \cdot x - b = -1$ and $w \cdot x - b = +1$, it is only necessary to compute
the perpendicular distance from $x_0$ to the plane $w \cdot x - b = 1$, denoted $r$. Since $w / \|w\|$
is a unit normal vector of the hyper-plane $w \cdot x - b = 1$, the point $x_0 + r \, w / \|w\|$ lies on
that hyper-plane by the definition of $r$, so $w \cdot (x_0 + r \, w / \|w\|) - b = 1$. Expanding gives
$(w \cdot x_0 - b) + r \|w\| = 1$, that is, $-1 + r \|w\| = 1$, and hence $r = 2 / \|w\|$ [54].

General example: consider a simple example from daily life. Suppose scenario 1 is given for testing
the approach. Scenario 1: “I have a business and I receive a lot of emails from customers every day.
Some of these emails are complaints and should be answered very quickly. I would like a way to identify
them quickly so that I answer these emails in priority”. For this scenario [55], we try to provide the
best solution with a supervised machine learning algorithm in a few significant steps. A supervised
learning algorithm, e.g., SVM, is trained on the labeled dataset as a linear model. In simple words, the
linear model is based on a line, the hyper-plane, that separates the data. The steps are described in
Figure 4, and they play an important role in finding the separating line [56], [57].


[Figure 2: tree diagram of machine learning approaches. Recoverable labels: supervised, semi-supervised,
and unsupervised learning; support vector machine, naive Bayes, neural networks, decision tree,
co-training, probabilistic approaches.]

Figure 2. Sub-division of machine learning approaches 
Figure 3. Types of abstract spaces of hyper-planes


Step-1: Collect a lot of emails, the more the better.

Step-2: Read the title of each email and classify it by saying “it is a complaint”
or “it is not a complaint”. This puts a label on each email.

Step-3: Train a model on this dataset.

Step-4: Assess the quality of the prediction (using cross-validation).

Step-5: Use this model to predict whether an email is a complaint or not.


Figure 4. Learning properties for a line with the separation process 
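Steps 2 and 3 of Figure 4 need the labeled email titles in numeric form before a linear model can be trained. One simple way to do this, shown here only as an assumption since the source does not state how titles are encoded, is a bag-of-words count vector. The example titles, labels, and function names are hypothetical.

```python
def build_vocabulary(titles):
    """Map every distinct lower-cased word in the titles to a column index."""
    words = sorted({w for title in titles for w in title.lower().split()})
    return {word: index for index, word in enumerate(words)}


def vectorize(title, vocabulary):
    """Count how often each vocabulary word occurs in one title."""
    vector = [0] * len(vocabulary)
    for word in title.lower().split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector


# Hypothetical labeled titles: +1 means "it is a complaint", -1 means it is not.
titles = ["refund not received", "thank you for the fast delivery", "order arrived broken"]
labels = [+1, -1, +1]

vocabulary = build_vocabulary(titles)
X = [vectorize(t, vocabulary) for t in titles]
```

The rows of `X` together with `labels` form the labeled dataset on which the model of Step-3 would be trained.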
There are four major types of SVM used for classification tasks in any field of research. In terms of
scenario 1, SVM is divided into four categories of classification: i) the original one: the maximum margin
classifier; ii) the kernelized version using the kernel trick; iii) the soft margin version; and iv) the soft
margin kernelized version, in which i), ii), and iii) are combined.

These four types reduce to two types of SVM: linear and non-linear. Figure 4 illustrates the linear
SVM. Formulation of the SVM problem: in a linear SVM, the basic element of the classifier is a
separating hyper-plane, a line whose equation takes the values +1 and -1 on the two margin boundaries.
The training points that lie on these boundaries are the support vectors; they determine the separating
hyper-plane. Mathematically, the linear SVM is described by the separating hyper-plane and its optimal
margin as in (1) to (4) [58].


$w \cdot x^{+} + b = +1$ (1)

$w \cdot x^{-} + b = -1$ (2)

$w \cdot (x^{+} - x^{-}) = 2$ (3)

$M = \frac{(x^{+} - x^{-}) \cdot w}{\|w\|} = \frac{2}{\|w\|}$ (4)


Where +1 belongs to the positive predicted class, -1 belongs to the negative predicted class, w is the weight 
vector and M is the margin width. 
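The margin width in (4) can be checked numerically. The sketch below uses illustrative values that are not from the paper ($w = (3, 4)$, $b = -2$): it starts from a point on the negative boundary, moves a distance $M$ along the unit normal, and lands on the positive boundary.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))


def margin_width(w):
    """Margin M = 2 / ||w||, as in equation (4)."""
    return 2.0 / norm(w)


# Illustrative values (assumed for this sketch).
w, b = [3.0, 4.0], -2.0
M = margin_width(w)                              # ||w|| = 5, so M = 0.4
x_minus = [0.12, 0.16]                           # satisfies w . x + b = -1
unit_normal = [wi / norm(w) for wi in w]
# Step a distance M from the negative boundary along the unit normal.
x_plus = [xi + M * ni for xi, ni in zip(x_minus, unit_normal)]
```

Evaluating `dot(w, x_plus) + b` gives +1 up to rounding, which is the statement that the two boundaries are exactly $2 / \|w\|$ apart.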

Consequently, the linear SVM must classify every training point correctly, which is expressed by the
constraint (5) [59].


$y_i (w \cdot x_i + b) \geq 1, \; \forall i$ (5)


To maximize the margin $\frac{2}{\|w\|}$, which is equivalent to minimizing $\frac{1}{2} w^T w$,
researchers formulate a quadratic optimization problem with its constraint as (6).

$\text{minimize } \Phi(w) = \frac{1}{2} w^T w \text{ subject to } y_i (w \cdot x_i + b) \geq 1, \; \forall i$ (6)


Solving the optimization problem means finding $w$ and $b$ such that $\Phi(w) = \frac{1}{2} w^T w$ is
minimized and, for all $\{(X_i, y_i)\}$, $y_i (w^T X_i + b) \geq 1$ [60]. This is the optimization of a
quadratic function subject to linear constraints. Quadratic optimization problems are a well-known
category of mathematical programming problems for which solution methods exist. The solution involves
constructing a dual problem in which a Lagrange multiplier $\alpha_i$ is associated with every
constraint in the primal problem [61]. The dual optimization problem takes the form (7):


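Find $\alpha_1, \ldots, \alpha_N$ such that
$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j X_i^T X_j$ is maximized
subject to $\sum_i \alpha_i y_i = 0$ and $\alpha_i \geq 0, \; \forall \alpha_i$ (7)

The form of the solution depends on $w$ and $b$ as (8):

$w = \sum_i \alpha_i y_i X_i$ and $b = y_k - w^T X_k$ for any $X_k$ such that $\alpha_k \neq 0$ (8)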
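The step from the primal problem (6) to the dual (7) and the solution (8) is not spelled out in the text. A sketch of the standard Lagrangian argument, using the same symbols, is given below; it is the usual textbook route and is included here as an aid, not as the derivation of [61].

```latex
\begin{aligned}
L(w, b, \alpha) &= \tfrac{1}{2} w^T w
   - \sum_i \alpha_i \left[ y_i (w^T X_i + b) - 1 \right], \qquad \alpha_i \geq 0 \\
\frac{\partial L}{\partial w} = 0 &\;\Rightarrow\; w = \sum_i \alpha_i y_i X_i \\
\frac{\partial L}{\partial b} = 0 &\;\Rightarrow\; \sum_i \alpha_i y_i = 0 \\
Q(\alpha) &= \sum_i \alpha_i
   - \tfrac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j X_i^T X_j
\end{aligned}
```

The last line follows by substituting the expression for $w$ into $L$: the quadratic term contributes $+\tfrac{1}{2}$ of the double sum, the constraint term contributes $-1$ times the same double sum, and the term in $b$ vanishes because $\sum_i \alpha_i y_i = 0$.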


Each non-zero $\alpha_i$ indicates that the corresponding $X_i$ is a support vector. Based on the
category of problem, the classifying function will have the form (9) [62].


$f(X) = \sum_i \alpha_i y_i X_i^T X + b$ (9)
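Equations (8) and (9) can be exercised on a toy problem. The sketch below assumes two training points, $(1, 1)$ with label $+1$ and $(-1, -1)$ with label $-1$, for which the dual objective $Q(\alpha) = 2\alpha - 4\alpha^2$ is maximized at $\alpha_1 = \alpha_2 = 1/4$. The data and function names are illustrative.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def weight_vector(alphas, labels, points):
    """w = sum_i alpha_i * y_i * X_i, as in equation (8)."""
    dim = len(points[0])
    return [sum(a * y * x[d] for a, y, x in zip(alphas, labels, points))
            for d in range(dim)]


def bias(w, alphas, labels, points):
    """b = y_k - w^T X_k for the first X_k whose multiplier is non-zero."""
    for a, y, x in zip(alphas, labels, points):
        if a != 0:
            return y - dot(w, x)
    raise ValueError("no support vector: all multipliers are zero")


def decision(x, alphas, labels, points, b):
    """f(X) = sum_i alpha_i * y_i * X_i^T X + b, as in equation (9)."""
    return sum(a * y * dot(xi, x) for a, y, xi in zip(alphas, labels, points)) + b


# Toy problem (assumed for illustration): both points are support vectors.
points = [[1.0, 1.0], [-1.0, -1.0]]
labels = [+1, -1]
alphas = [0.25, 0.25]
w = weight_vector(alphas, labels, points)
b = bias(w, alphas, labels, points)
```

With these values the decision function returns $+1$ and $-1$ on the two training points, so both lie exactly on the margin boundaries, and the sign of `decision(x, ...)` gives the predicted class of any other point.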


This function depends on an inner product between the test point $X$ and the support vectors $X_i$.
Solving the optimization problem likewise involves computing the inner products $X_i^T X_j$ between all
pairs of points of the training set. If noise is detected in the training set, slack variables $\xi_i$
can be added to allow misclassification of difficult data points. After the slack variables are added,
the quadratic optimization problem becomes (10), and the representation is shown in Figure 5.


$\min \Phi(w) = \frac{1}{2} w^T w + C \sum_i \xi_i \text{ subject to } y_i (w^T X_i + b) \geq 1 - \xi_i \text{ and } \xi_i \geq 0, \; \forall i$ (10)


Where $C$ indicates the capacity parameter, $\xi_i$ are the slack variables that handle the
non-separable points, and the index $i$ runs over the $N$ labeled training points [63].
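For a fixed $w$ and $b$, the smallest slack that satisfies the constraints in (10) is $\xi_i = \max(0, 1 - y_i (w^T X_i + b))$, so the objective can be evaluated directly. The sketch below does this for illustrative data (an assumption, not from the paper) in which the third point lies on the wrong side of the hyper-plane.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def slacks(w, b, points, labels):
    """Smallest xi_i >= 0 satisfying y_i (w^T X_i + b) >= 1 - xi_i."""
    return [max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(points, labels)]


def soft_margin_objective(w, b, points, labels, C):
    """Phi(w) = 1/2 w^T w + C * sum_i xi_i, as in equation (10)."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, points, labels))


# Illustrative data (assumed): the last point is misclassified by w, b.
points = [[2.0, 0.0], [-2.0, 0.0], [-0.5, 0.0]]
labels = [+1, -1, +1]
w, b = [1.0, 0.0], 0.0
```

Raising `C` makes the same misclassified point more expensive, which is how the capacity parameter trades margin width against training errors.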

If the dataset is harder still than the case shown in Figure 5, it is converted into a
higher-dimensional space. The linear classifier relies on the dot product between vectors,
$K(X_i, X_j) = X_i^T X_j$. When every data point is mapped into a higher-dimensional space through a
transformation $\Phi: X \rightarrow \varphi(X)$, a kernel function corresponds to an inner product
within the expanded feature space. If the data is not linearly separable, a function is used to
transform it into a higher-dimensional space [64], so that, for example, the data goes from 1D to 2D.
Given the function f(x), this conversion of the dataset into the feature space is easy. The only problem
with the transformation into a higher-dimensional feature space is that it is computationally expensive.
For this reason, researchers use the kernel trick to reduce the computation. A function that takes
vectors in the original space as input and returns the dot product of the vectors in the feature space is
called a kernel function; its use is referred to as the kernel trick [65]. Using a kernel function,
researchers can compute the dot product of two vectors as if every point were mapped to a
higher-dimensional space via some transformation, without carrying out the mapping. Essentially, the
kernel trick turns a problem that is non-linear in the input space into a linear one in the feature
space. The following popular kernel functions are used to transform the data into a higher-dimensional
feature space [66].

- Linear kernel function, $K(X_i, X_j) = X_i^T X_j$
- Polynomial kernel function, $K(X_i, X_j) = (1 + X_i^T X_j)^p$
- Radial basis function (RBF), $K(X_i, X_j) = \exp\left(-\frac{\|X_i - X_j\|^2}{2\sigma^2}\right)$
- Sigmoid kernel function, $K(X_i, X_j) = \tanh(\beta_0 X_i^T X_j + \beta_1)$
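The four kernel functions listed above translate almost directly into code. The sketch below is a plain-Python rendering; the default parameter values (`p=2`, `sigma=1.0`, `beta0=1.0`, `beta1=0.0`) are assumptions chosen for illustration, not values recommended by the source.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(xi, xj):
    """K(Xi, Xj) = Xi^T Xj."""
    return dot(xi, xj)


def polynomial_kernel(xi, xj, p=2):
    """K(Xi, Xj) = (1 + Xi^T Xj)^p."""
    return (1.0 + dot(xi, xj)) ** p


def rbf_kernel(xi, xj, sigma=1.0):
    """K(Xi, Xj) = exp(-||Xi - Xj||^2 / (2 sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def sigmoid_kernel(xi, xj, beta0=1.0, beta1=0.0):
    """K(Xi, Xj) = tanh(beta0 * Xi^T Xj + beta1)."""
    return math.tanh(beta0 * dot(xi, xj) + beta1)
```

Each function takes two input-space vectors and returns a single number, so any of them can be substituted wherever the formulation uses the inner product $X_i^T X_j$.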


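[Figure 5: two classes (Class 1 and Class 2) separated by a hyper-plane, showing the margin, the support
vectors on the boundaries $y_i (\langle w, x_i \rangle + b) = +1$ and $y_i (\langle w, x_i \rangle + b) = -1$,
and the slack variables $\xi_i$.]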


Figure 5. Linearly separable hyper-plane with slack variables 


Unfortunately, choosing the correct kernel is a non-trivial task and may depend on the specific
task at hand. No matter which kernel researchers choose for their problem, they need to tune the kernel
parameters to get good performance from the classifier. A popular technique for tuning these parameters
is k-fold cross-validation [57]. Mercer's theorem: in machine learning [67], the kernel function scheme
is a well-known trick, and the kernels involved are mostly of Mercer type; with it, a variety of
problems such as regression, classification, and inverse problems can be solved efficiently. Kernel
functions are linked with a feature mapping through which the dataset is mapped from the original space
to a higher-dimensional feature space. A common assumption is that the input space $X$ is mapped via a
feature mapping $\Phi: X \rightarrow H$ such that, for all $x, y \in X$,
$K(x, y) = \langle \Phi(x), \Phi(y) \rangle_H$. In Figure 6, the steps of this theorem are described in
detail with useful equations and matrices [46].
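The relation $K(x, y) = \langle \Phi(x), \Phi(y) \rangle_H$ can be verified for a concrete kernel. The sketch below uses the degree-2 polynomial kernel $(1 + x^T y)^2$ on 2D inputs together with one explicit feature map into a 6D space; the particular map and the test vectors are chosen for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """K(x, y) = (1 + x^T y)^2, computed in the 2-D input space."""
    return (1.0 + dot(x, y)) ** 2


def poly2_feature_map(x):
    """Explicit Phi(x) into a 6-D feature space for the kernel above."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, r2 * x1 * x2, x2 * x2]


x, y = [1.0, 2.0], [3.0, -1.0]
lhs = poly2_kernel(x, y)                                 # kernel in the input space
rhs = dot(poly2_feature_map(x), poly2_feature_map(y))    # inner product in H
```

Both sides agree up to floating-point rounding, while the kernel side never builds the 6D vectors. That saving is what the kernel trick exploits when the feature space is very large.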
Input: Let us denote th2 as the threshold value for R_SVM(p) selection in algorithm
step (b) and th3 as the threshold to select R_SVM(p) in algorithm step (c).

R_SVM(SVM_sent, p): set of SVM results obtained after performing SVM classification;
SVM_sent presents the sentiments, and p is the probability of sentence classification.

R_NB(NB_sent, v): set of NB classification results obtained after performing NB
classification; NB_sent indicates the sentiment, and v, the NB result value, contains “1” for a
positive sentence and “-1” for a negative sentence.

th3 = min(R_SVM(p))


Figure 6. Stepwise Mercer's theorem 


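The dual problem formulation for non-linear SVMs is described in (11) and (12) [68].

Find $\alpha_1, \ldots, \alpha_N$ such that
$Q(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j k(X_i, X_j)$ is maximized
subject to $\sum_i \alpha_i y_i = 0$ and $\alpha_i \geq 0, \; \forall \alpha_i$ (11)


$f(X) = \sum_i \alpha_i y_i k(X_i, X) + b$ (12)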
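The kernelized decision function (12) can be tried on XOR-style data, which no line in the input space separates. The sketch below assumes the polynomial kernel $(1 + X_i^T X)^2$ and multipliers $\alpha_i = 1/8$ with $b = 0$; these values satisfy $\sum_i \alpha_i y_i = 0$ and place every training point exactly on the margin, but they are stated here as an assumption and not derived.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(xi, x):
    """k(Xi, X) = (1 + Xi^T X)^2."""
    return (1.0 + dot(xi, x)) ** 2


def kernel_decision(x, alphas, labels, points, b, kernel):
    """f(X) = sum_i alpha_i * y_i * k(X_i, X) + b, as in equation (12)."""
    return sum(a * y * kernel(xi, x) for a, y, xi in zip(alphas, labels, points)) + b


# XOR-style toy data: not separable by any line in the 2-D input space.
points = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
labels = [+1, +1, -1, -1]
alphas = [0.125, 0.125, 0.125, 0.125]   # assumed multipliers; sum_i alpha_i * y_i = 0
b = 0.0

predictions = [kernel_decision(p, alphas, labels, points, b, poly2_kernel)
               for p in points]
```

For this data the decision function reduces to the product of the two coordinates, so the sign of `kernel_decision` reproduces the XOR labeling even though the classifier is linear in the feature space.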


4. RESULT 

In this section, we present the modifications made to SVM by different researchers in different
years. Table 2 (see Appendix) presents the categories of results, namely the year, function,
improvement, and objective of the modifications that researchers made to SVM for their research purposes
[69]-[79]. In this research, we collect all the information related to SVM. Table 3 shows properties of
SVM: its applications, pros, and cons, together with where it has been modified and for what purpose, so
that researchers in different fields can use it.


Table 3. SVMs pros, cons, and applications 


Parameter: Pros (some advantages of SVMs)
Value:
- They are effective in high-dimensional spaces; they are still effective in cases where the number of
  dimensions N of the space $\mathbb{R}^N$ is greater than the number of samples.
- They use a subset of the training points, the support vectors, in the decision function, so they are
  also memory efficient.
- They are versatile: different kernels can be specified for the decision function. Common kernels are
  provided, but it is also possible to specify custom kernels. Kernel functions can be added together to
  achieve even more complex hyper-planes, K1 + K2, where K1 and K2 represent the first and second kernel
  functions respectively [79].

Parameter: Cons (the disadvantages of SVMs)
Value:
- SVM does not directly provide probability estimates; these are calculated using an expensive
  five-fold cross-validation.
- If the number of features is greater than the number of samples, the method is likely to give poor
  performance.

Parameter: Applications (SVM can be a quite popular alternative to artificial neural networks; some
useful applications where SVMs are used for good performance)
Value:
- Medical imaging
- Regression model to study the air quality in urban areas
- Image interpolation
- Medical diagnosis tasks
- Time series prediction as well as financial analysis
- Encoding theory with practice
- Pattern recognition
- Page ranking algorithm
- Text and object recognition


5. DISCUSSION 

The optimization problems were solved with the help of support vectors along with analytical
analysis. Some conditions were applied to the selected dataset for further processing. Especially for
small training datasets or the linear case, it is critical to know which of the training points become
support vectors. This can happen when the problem has symmetry. In general, the worst-case computational
complexity arises during analytical analysis. The linear and non-linear SVM formulations therefore
provide the path to a solution both numerically and analytically, while a range of approaches is used
for larger problems. In this article, we describe only the general research areas with the researchers'
efforts and the generalization of the SVM formulations for linear and non-linear problems. Properties
such as the complexity, scalability, and parallelizability of SVMs play significant roles in processing.
The training and testing functions depend on the data through the kernel functions, which correspond to
a dot product in a higher-dimensional space. There are some open points for SVMs where deeper knowledge
and more research are required, such as the choice of the best kernel for a dataset, processing speed,
dataset size in the training and testing phases, rescaling, and optimal design for multi-class problems.
With the RBF kernel function, the classifier automatically gives values for the RBF weights, number of
centers, center positions, and threshold.


6. CONCLUSION 

This article provides a detailed description of the concept of linear and non-linear SVM for
multiple areas of research such as sentiment analysis, text classification or categorization, image
classification, and bio-informatics. It gives both theoretical and mathematical evidence that SVMs are a
very appropriate option for multiple fields. The theoretical analysis concludes that SVMs have the
properties needed to complete the given tasks. The mathematical generalization shows that SVMs
consistently achieve good performance and considerably, sometimes drastically, outperform existing
methods. In this paper, the authors also present the kernel trick idea of SVM in multiple fields with
its four main functions: linear, polynomial, RBF, and sigmoid. All reviewed articles show that selecting
the best kernel function provides excellent results with high accuracy. We analyze the application areas
of existing linear and non-linear SVMs with the kernel trick, along with the selection of the function
and the parameter settings. We also analyze the classical procedure of SVM for training the weights and
the improvement of training and testing error rates through the soft margin. The improvement in testing
error, or risk, came exclusively from the lower value of the training error, or empirical risk. All this
makes SVMs a very promising and easy-to-use method for learning classifiers in multiple fields from the
given or selected examples.